Let $U$ be a given function defined on $\mathbb{R}^d$ and let $\pi(x)$ be a density function proportional to $\exp(-U(x))$. The following diffusion $X(t)$ is often used to sample from $\pi(x)$:

$$dX(t) = -\nabla U(X(t))\,dt + \sqrt{2}\,dW(t), \qquad X(0) = x.$$

To accelerate the convergence, a family of diffusions with $\pi(x)$ as their common equilibrium is considered:

$$dX(t) = \big(-\nabla U(X(t)) + C(X(t))\big)\,dt + \sqrt{2}\,dW(t), \qquad X(0) = x.$$

Let $L_C$ be the corresponding infinitesimal generator. The spectral gap of $L_C$ in $L^2(\pi)$, denoted $\lambda(C)$, and the convergence exponent of $X(t)$ to $\pi$ in variational norm, denoted $\rho(C)$, are used to describe the convergence rate, where

$$\lambda(C) = \sup\{\operatorname{Re}\mu : \mu \text{ is in the spectrum of } L_C,\ \mu \neq 0\},$$

$$\rho(C) = \inf\Big\{\rho : \int |p(t,x,y) - \pi(y)|\,dy \le g(x)\,e^{\rho t}\Big\}.$$



Roughly speaking, $L_C$ is a perturbation of the self-adjoint $L_0$ by an antisymmetric operator $C \cdot \nabla$, where $C$ is weighted divergence free. We prove that $\lambda(C) \le \lambda(0)$ and that equality holds only in some rare situations. Furthermore, $\rho(C) \le \lambda(C)$, and equality holds for $C = 0$. In other words, adding an extra drift $C(x)$ accelerates convergence. Related problems are also discussed.

1. Introduction. In this paper we prove that by simply adding a weighted divergence-free drift to a reversible diffusion, the convergence to equilibrium is accelerated. In other words, from an algorithmic point of view, the nonreversible algorithm performs better. The analysis is related to the study of antisymmetric perturbations of self-adjoint infinitesimal generators.
Our investigation is motivated by the following consideration. High-dimensional probability distributions appear frequently in applications. Sampling from these distributions directly is not feasible in practice, especially when the corresponding densities are known only up to normalizing constants. One has to resort to approximations. A Markov process with the underlying distribution as its equilibrium is often used to generate an approximation ("MCMC"). How good the approximation is depends on the approximating Markov process and on the specific criterion used for comparison. One may investigate the convergence properties of some particular Monte Carlo Markov processes, compare the convergence rate within a family of Markov processes (with the same equilibrium) w.r.t. different criteria, or even try to find optimal solutions in that family. The mathematical problems arising from this approach are challenging. Related works may be found in Amit (1991), Amit and Grenander (1991), Frigessi, Hwang and Younes (1992), Frigessi, Hwang, Sheu and di Stefano (1993), Hwang, Hwang-Ma and Sheu (1993), Amit (1996), Athreya, Doss and Sethuraman (1996), Gilks and Roberts (1996), Mengersen and Tweedie (1996), Stramer and Tweedie (1997), Chang and Hwang (1998), Hwang and Sheu (1998, 2000) and Roberts and Rosenthal (2004).

Here we concentrate on the diffusion case. Let $U$ be a given real-valued function defined on $\mathbb{R}^d$ satisfying some smoothness conditions. The underlying distribution $\pi$ is assumed to have a density proportional to $\exp(-U(x))$. The following diffusion is commonly used for sampling from its equilibrium $\pi$:

$$dX(t) = -\nabla U(X(t))\,dt + \sqrt{2}\,dW(t), \qquad X(0) = x, \tag{1}$$

where $W(t)$ is Brownian motion in $\mathbb{R}^d$. For convenience, $\pi$ will be used to denote both the underlying probability measure and its probability density. For applications one may consult Grenander and Miller (1994), Miller, Srivastava and Grenander (1995), Srivastava (1996) and references therein.

If a diffusion is regarded as a useful approach to sampling, then it is natural to consider a family of diffusions with $\pi$ as their common equilibrium:

$$dX(t) = -\nabla U(X(t))\,dt + C(X(t))\,dt + \sqrt{2}\,dW(t), \qquad X(0) = x, \tag{2}$$

under suitable conditions on $C(x)$. Roughly speaking, the conditions are that $\operatorname{div}(C(x)\exp(-U(x))) = 0$ and that there is no explosion in (2), that is, $|X(t)|$ does not tend to infinity in finite time. A strict definition of explosion can be found on page 172 of Ikeda and Watanabe (1989). Exact conditions will be spelled out later. It is easy to pick such a $C$: for example, $C(x) = S\nabla U(x)$ for any skew-symmetric matrix $S$. Indeed, $\operatorname{div}(S\nabla U\,e^{-U}) = e^{-U}\big(\operatorname{tr}(S\nabla^2 U) - \nabla U \cdot S\nabla U\big) = 0$, because $S$ is skew-symmetric and the Hessian $\nabla^2 U$ is symmetric. We are interested in how $C(x)$ influences the convergence of the diffusion (2) to equilibrium.
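In practice (1) and (2) must be discretized before they can be used for sampling. As a minimal sketch, assuming an Euler–Maruyama discretization (which the text does not discuss and which introduces an $O(h)$ bias in the stationary law), a Gaussian target $U(x) = \tfrac12 x^T A x$, and $C = S\nabla U$, the following simulates many independent chains of (1) and (2) and compares their empirical covariance with $A^{-1}$. The function name `euler_maruyama` and all numerical values are illustrative choices.

```python
import numpy as np


def euler_maruyama(grad_U, S, x0, h, n_steps, rng):
    """Euler-Maruyama discretization of
        dX = (-grad U(X) + S grad U(X)) dt + sqrt(2) dW,
    i.e. diffusion (2) with C = S grad U; S = 0 gives the reversible diffusion (1).
    x0 has shape (n_chains, d); all chains are advanced in parallel."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_U(x)                    # row k is grad U evaluated at chain k
        drift = -g + g @ S.T             # row k is -grad U + S grad U
        x = x + h * drift + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
    return x


# Hypothetical Gaussian target: U(x) = x^T A x / 2, so pi = N(0, A^{-1}).
A = np.array([[1.0, 0.0], [0.0, 4.0]])
grad_U = lambda x: x @ A                 # A is symmetric, so row k is (A x_k)^T
S = np.array([[0.0, 2.0], [-2.0, 0.0]])  # skew-symmetric

rng = np.random.default_rng(0)
for skew in (np.zeros((2, 2)), S):
    x = euler_maruyama(grad_U, skew, np.full((10000, 2), 2.0), 0.005, 2000, rng)
    # Both empirical covariances should be close to inv(A) = diag(1, 0.25).
    print(np.cov(x.T).round(2))
```

Both runs target the same $\pi$; only the speed of relaxation differs, which is exactly the quantity studied below.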

Hwang, Hwang-Ma and Sheu (1993) focused on a special case, the study of a family of Gaussian diffusions where $2U(x) = (-Dx)\cdot x$, $-\nabla U(x) = Dx$, $C(x) = SDx$, $D$ is a strictly negative-definite real symmetric matrix and $S$ is any skew-symmetric real matrix. In this case, $\pi(x)$ is Gaussian with mean $0$ and covariance matrix $-D^{-1}$, and $X(t)$ is an Ornstein–Uhlenbeck process with drift $(D + SD)x$. Using the rate of convergence of the covariance of $X(t)$ [or together with $EX(t)$] as the criterion, the reversible diffusion with drift $Dx$ (i.e., $C = 0$) is the worst choice, and the optimal solution is obtained in this setup.
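In this Gaussian setting part of the comparison reduces to linear algebra: the mean satisfies $E_x X(t) = e^{t(D + SD)}x$, so its exponential rate is the largest real part of an eigenvalue of $D + SD$. The sketch below (randomly generated $D$ and $S$; the function name is hypothetical) checks numerically that this rate never exceeds the reversible rate $\max \operatorname{eig}(D)$, and never falls below $\operatorname{tr}(D)/d$, because $\operatorname{tr}(SD) = 0$ for skew $S$ and symmetric $D$. It only examines the mean, not the covariance criterion used by Hwang, Hwang-Ma and Sheu (1993).

```python
import numpy as np


def ou_mean_decay_rate(D, S):
    """Largest real part of the eigenvalues of D + S D.
    For the OU process dX = (D + S D) X dt + sqrt(2) dW we have
    E_x X(t) = expm(t (D + S D)) x, so this is the exponential rate of the mean."""
    return float(np.max(np.linalg.eigvals(D + S @ D).real))


rng = np.random.default_rng(1)
d = 4
Q = rng.standard_normal((d, d))
D = -(Q @ Q.T + np.eye(d))               # symmetric, strictly negative definite
rate_reversible = ou_mean_decay_rate(D, np.zeros((d, d)))   # C = 0

for _ in range(3):
    K = rng.standard_normal((d, d))
    S = K - K.T                           # skew-symmetric
    print(ou_mean_decay_rate(D, S), "<=", rate_reversible)
```

The upper bound holds because, after the similarity $(-D)^{1/2}(\cdot)(-D)^{-1/2}$, the matrix $D + SD$ has symmetric part $D$, and the real part of any eigenvalue is bounded by the largest eigenvalue of the symmetric part.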

If $C(x)$ is not zero, then the corresponding diffusion, regarded as a Markov process, is nonreversible. In general, nonreversible processes are difficult to analyze. We cite only some related works in different settings. In Geman and Geman (1984), Amit and Grenander (1991) and Hwang and Sheu (1998), the convergence properties of some nonreversible Gibbs samplers are studied. The ergodicity of the systematic sweep in stochastic relaxation, again nonreversible, is investigated in Hwang and Sheu (1992).

Two comparison criteria are considered here. Basic questions such as the 
acceleration of convergence and the consistency of the comparison w.r.t. 
these two criteria are answered. Related problems will be discussed in the 
last section. 

Let $\|\cdot\|_p$ and $\|\cdot\|_{p\to q}$ denote the norm in $L^p(\pi)$ and the operator norm from $L^p(\pi)$ to $L^q(\pi)$, respectively, $1 \le p, q \le \infty$. For $p = q = 2$, both norms are simply denoted by $\|\cdot\|$. Let $L_C$ denote the infinitesimal generator of the diffusion $X(t)$ from (2) and, for $C = 0$, let $L = L_0$. Let $T(t) = e^{tL_C}$ denote the corresponding semigroup,

$$T(t)f(x) = E_x f(X(t)) = \int f(y)\,p(t,x,y)\,dy,$$

where $p(t,x,y)$ is the transition density if it exists. Note that the index $C$ is suppressed from $T(t)$ and $p(t,x,y)$ for the sake of brevity.

We now define the spectral gap of $L_C$ in $L^2(\pi)$ as the first comparison criterion. Since $E_x f(X(t)) \to \pi(f)$ for any starting point $x$, one may consider the average-case formulation by averaging the difference $(E_x f(X(t)) - \pi(f))^2$ over the starting point w.r.t. $\pi$:

$$\int \big(E_x f(X(t)) - \pi(f)\big)^2\,\pi(x)\,dx \le \text{constant}\,\|f - \pi(f)\|^2\,e^{2\lambda t} \tag{3}$$

for some $\lambda \le 0$, where $\pi(f)$ means integration of $f$ w.r.t. $\pi$. Now consider the worst case over $f$; then $\|T(t) - \pi\| \le \text{constant}\,e^{\lambda t}$. The infimum over such $\lambda$'s indicates the convergence rate. This shows that the spectral radius of $T(1)$ in the space $\{f \in L^2(\pi) : \pi(f) = 0\}$ is a measure of the convergence rate of diffusions to equilibrium. Furthermore, the weak spectral mapping theorem holds between $L_C$ and $e^{tL_C}$ [Nagel (1986), page 91]. Hence, the spectral gap of $L_C$ in $L^2(\pi)$, defined by

$$\lambda(C) = \sup\{\operatorname{Re}\mu : \mu \text{ in the spectrum of } L_C,\ \mu \neq 0\}, \tag{4}$$

is a good candidate to serve as a criterion for the comparison of convergence rates.

The constant in (3) may depend on $C$. If instead we reformulate the inequality in (3) without the constant,

$$\|T(t)f - \pi(f)\| \le \|f - \pi(f)\|\,e^{\lambda t}$$

for some $\lambda$, then the inequality depends only on the behavior of the process around time $0$, and the rate will be the same regardless of perturbations [Chen (1992), page 312]. Our interest here is instead in the large-time behavior.

We will always assume that there is no explosion for the diffusions under consideration. Sufficient conditions for nonexplosion may be found, for example, in Proposition 1.10 of Stannat (1999). Since the existence of the transition density is needed in Section 3, for simplicity we assume that the following assumption holds throughout this paper:

(A1) $C$ and $\nabla U$ are in $L^l(\pi) \cap L^l_{\mathrm{loc}}(\pi)$ for some $l > d$; for $f \in C_0^\infty$, $\int (C \cdot \nabla f)\,\pi = 0$.

Under (A1) there is no explosion in the diffusion (2), and the transition density exists with $\pi$ as its equilibrium distribution [Stannat (1999) and Bogachev, Krylov and Röckner (2001)]. The condition $\int (C \cdot \nabla f)\,\pi = 0$ for $f \in C_0^\infty$ means that $C$ is weakly weighted divergence free. This is essential for $\pi$ to be an invariant measure.
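The weak divergence-free condition in (A1) can be checked numerically for the example $C = S\nabla U$. The sketch below is ours: it uses a hypothetical non-Gaussian potential and test function (neither comes from the text) and approximates $\int (C \cdot \nabla f)\,\pi$ by a Riemann sum on a box large enough that the tails of $\pi$ are negligible. The test function is not compactly supported, but its growth is dominated by the decay of $\pi$, so the integral should vanish up to quadrature and rounding error.

```python
import numpy as np


# Hypothetical non-Gaussian potential: U(x) = |x|^4 / 4 + x1 x2 / 2 + 0.3 x1.
def U(x1, x2):
    return (x1**2 + x2**2) ** 2 / 4 + 0.5 * x1 * x2 + 0.3 * x1


def grad_U(x1, x2):
    r2 = x1**2 + x2**2
    return r2 * x1 + 0.5 * x2 + 0.3, r2 * x2 + 0.5 * x1


# Hypothetical smooth test function f(x) = exp(0.3 x1) cos(x2); only its gradient is needed.
def grad_f(x1, x2):
    e = np.exp(0.3 * x1)
    return 0.3 * e * np.cos(x2), -e * np.sin(x2)


def weak_divergence_pairing(S, half_width=5.0, n=401):
    """Riemann-sum approximation of int (C . grad f) pi dx with C = S grad U."""
    t = np.linspace(-half_width, half_width, n)
    x1, x2 = np.meshgrid(t, t, indexing="ij")
    w = np.exp(-U(x1, x2))
    w /= w.sum()                          # discrete normalization of pi (cell areas cancel)
    g1, g2 = grad_U(x1, x2)
    c1 = S[0, 0] * g1 + S[0, 1] * g2      # C = S grad U, componentwise
    c2 = S[1, 0] * g1 + S[1, 1] * g2
    f1, f2 = grad_f(x1, x2)
    return float(np.sum((c1 * f1 + c2 * f2) * w))


S = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(weak_divergence_pairing(S))         # expected: zero up to rounding
```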

Intuitively, $L_C$ is a perturbation of the self-adjoint operator $L$ by the antisymmetric operator $C \cdot \nabla$ in $L^2(\pi)$. We are interested in how the spectrum changes. Note that, in general, this perturbation is neither small nor relatively compact. For general references, see Kato (1995) and Yosida (1980). $L_C$ is not self-adjoint for nonzero $C$. The spaces considered are real vector spaces of real functions. However, for spectral analysis, one has to consider complex vector spaces. We will make the distinction when it is necessary. Let $\mathcal{C}_+$ denote $L_C - L$ and $\mathcal{C}_-$ denote $L_{-C} - L$.

We assume that the reversible diffusion (1) w.r.t. $\pi$ has an exponential convergence rate. Equivalently, $L$ has a spectral gap in $L^2(\pi)$, that is,

(A2) $\lambda(0) < 0$.

The existence of a spectral gap for self-adjoint $L$ has been studied extensively; see, for example, Wang (1999).

Under the above two assumptions we prove that $\lambda(C) \le \lambda(0)$. Furthermore, if $\lambda(0)$ is in the discrete spectrum of $L$, then equality holds only in some rare situations, which are characterized completely. These results are in Theorem 1.
Note that the exponential convergence rate assumption is imposed only on the reversible diffusion. As a consequence of Theorem 1, the perturbed diffusion (2) has a better exponential convergence rate. In other words, adding an extra drift accelerates convergence.

For nonexplosion of (1), (A2), and $\lambda(0)$ being in the discrete spectrum of $L$ all to hold, the following is a sufficient condition [Reed and Simon (1978)]:

$$\tfrac12|\nabla U(x)|^2 - \Delta U(x) \to \infty \quad \text{as } |x| \to \infty. \tag{5}$$
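As a quick check of (5), here is the computation for two illustrative potentials (our choices, not examples taken from the text):

```latex
% Gaussian case, U(x) = |x|^2 / 2:
\nabla U(x) = x, \quad \Delta U(x) = d, \qquad
\tfrac12|\nabla U(x)|^2 - \Delta U(x) = \tfrac12|x|^2 - d \;\to\; \infty .

% Quartic case, U(x) = |x|^4 / 4:
\nabla U(x) = |x|^2 x, \quad \Delta U(x) = (d+2)\,|x|^2, \qquad
\tfrac12|\nabla U(x)|^2 - \Delta U(x) = \tfrac12|x|^6 - (d+2)\,|x|^2 \;\to\; \infty .
```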

From a probabilistic point of view, one may consider the rate of convergence of $p(t,x,y)$ to $\pi$ in variational norm as a comparison criterion. The variational norm of two probability measures is defined as the supremum of the difference between the two probabilities over all events. This may be regarded as a kind of worst-case analysis. Note that the variational norm equals one half of the $L^1(dy)$ distance between the two corresponding densities. Hence, $\rho(C)$ defined below is used as a comparison criterion:

$$\rho(C) = \inf\Big\{\rho : \int |p(t,x,y) - \pi(y)|\,dy \le g(x)\,e^{\rho t}\Big\}. \tag{6}$$

Here $g(x)$ may depend on $C$. Usually $g$ is assumed to be essentially locally bounded or locally integrable w.r.t. $\pi$; the case of unrestricted $g$ needs further study. We prove in Theorems 4 and 5 that $\rho(C) \le \lambda(C)$, with equality in the reversible case. Hence $\rho(C) \le \lambda(C) \le \lambda(0) = \rho(0)$: again, using $\rho(C)$ as the comparison criterion, adding an antisymmetric perturbation does help. This result is consistent with the previous one.
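On a finite state space the identity between the variational norm and one half of the $L^1$ distance can be checked by brute force over all events. A small sketch (the probability vectors and function names are illustrative); the supremum is attained at the event $\{p > q\}$:

```python
import itertools

import numpy as np


def tv_sup_over_events(p, q):
    """sup over all events A of |P(A) - Q(A)|, by brute force on a finite space."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for event in itertools.combinations(range(n), r):
            idx = list(event)
            best = max(best, abs(p[idx].sum() - q[idx].sum()))
    return best


def tv_half_l1(p, q):
    """One half of the L^1 distance between the probability vectors p and q."""
    return 0.5 * float(np.abs(p - q).sum())


p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.full(4, 0.25)
print(tv_sup_over_events(p, q), tv_half_l1(p, q))   # both 0.2 up to rounding
```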

It is not clear how the perturbations affect $\rho(C)$ directly. We compare $\rho(C)$ and $\rho(0)$ via $\lambda(C)$ and $\lambda(0)$.

We study the above two criteria only. However, we make the following remarks without giving proofs. Since $T(t)$ is a contraction semigroup in $L^p(\pi)$ for $1 \le p \le \infty$, one may consider (3) in terms of the $L^p$ norm. For a fixed $C$, consider the dependence of the convergence rate on $p$. Note that when (1) is an Ornstein–Uhlenbeck process, $\|T(t) - \pi\|_{1\to1}$ does not have an exponential convergence rate, despite the fact that the corresponding $L$ has a spectral gap in $L^2(\pi)$. For the reversible case, the $L^1(\pi)$ to $L^1(\pi)$ exponential convergence rate is equivalent to the essential uniform boundedness of $g(x)$ in (6) [Chen (2002)]. If $\|T(t) - \pi\|_{p\to p}$ has an exponential convergence rate for some $p > 1$ and $\|T(1)\|_{p\to(p+1)}$ is bounded, then $\|T(t) - \pi\|_{q\to q}$ has the same exponential convergence rate for all $q \ge p$.

The use of $\lambda(C)$ as the comparison criterion is studied in Section 2. In Section 3, $\rho(C)$ is the criterion, and the relationship between $\lambda(C)$ and $\rho(C)$ is studied. Discussion and related problems are presented in Section 4.
2. Spectral gap as comparison criterion. If $\lambda(0)$ is in the discrete spectrum of $L$ in $L^2(\pi)$, then by definition its corresponding eigenspace, denoted by $M$, is finite dimensional. Let $D(\cdot)$ denote "the domain of." Define

$$\mathcal{E}(f,g) = \int (\nabla f \cdot \nabla g)\,\pi, \qquad f, g \in C_0^\infty.$$

Then $\mathcal{E}$ is closable in $L^2(\pi)$. In this section our analysis assumes $\pi(f) = 0$, $f \in L^2(\pi)$.

Theorem 1. If (A1) and (A2) hold, then $\lambda(C) \le \lambda(0)$. Furthermore, if $\lambda(0)$ is in the discrete spectrum of $L$, then equality holds if and only if $\mathcal{C}_+$ or $\mathcal{C}_-$ leaves a nonzero subspace of $M$ invariant.

The following inequality from Stannat [(1999), page 124] will be used repeatedly in the proof:

$$\text{If } f \in D(L_C), \text{ then } f \in D(\mathcal{E}) \text{ and } \mathcal{E}(f,f) \le -\int (L_C f)\,f\,\pi. \tag{7}$$
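For smooth compactly supported $f$, (7) is in fact an equality, and the computation shows where the antisymmetry of $C \cdot \nabla$ enters. A sketch (a routine integration by parts, assuming the generator of (2) acts on $C_0^\infty$ as $L_C f = \Delta f - \nabla U \cdot \nabla f + C \cdot \nabla f$; the extension to all of $D(L_C)$ is Stannat's result and is not reproduced here):

```latex
% Reversible part: integrate by parts against \pi \propto e^{-U}.
\int (\Delta f - \nabla U \cdot \nabla f)\, f\,\pi = -\int |\nabla f|^2\,\pi = -\mathcal{E}(f,f),

% Antisymmetric part: apply the weak divergence-free condition in (A1) to f^2 \in C_0^\infty.
\int (C \cdot \nabla f)\, f\,\pi = \tfrac12 \int C \cdot \nabla (f^2)\,\pi = 0,

% Hence, for f \in C_0^\infty:
\int (L_C f)\, f\,\pi = -\mathcal{E}(f,f).
```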

We prove first that $\lambda(C) \le \lambda(0)$. For $f$ with $\|f\| = 1$ and $\pi(f) = 0$, let $g(t) = \|T(t)f\|^2$. Then $g(0) = 1$ and, by (7),

$$g'(t) = 2\int (L_C T(t)f)(T(t)f)\,\pi \le -2\,\mathcal{E}(T(t)f, T(t)f) \le 2\lambda(0)\,g(t).$$

The above differential inequality implies that the operator norm $\|T(t)\|$ in the space $\{f : f \in L^2(\pi),\ \pi(f) = 0\}$ is less than or equal to $e^{\lambda(0)t}$. Hence, $\lambda(C) \le \lambda(0)$.
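Spelled out, the last steps are routine (this expansion is ours, not quoted from the source). The final inequality in the display for $g'(t)$ is the spectral theorem for the self-adjoint $L$, applied to $T(t)f$, which stays mean zero because $\pi$ is invariant; the operator-norm bound is Grönwall's lemma:

```latex
\mathcal{E}(h,h) \ge -\lambda(0)\,\|h\|^2 \quad \text{for } h \in D(\mathcal{E}),\ \pi(h) = 0,

\frac{d}{dt}\Big(e^{-2\lambda(0)t}\,g(t)\Big)
  = e^{-2\lambda(0)t}\big(g'(t) - 2\lambda(0)\,g(t)\big) \le 0
\;\Longrightarrow\;
g(t) \le g(0)\,e^{2\lambda(0)t} = e^{2\lambda(0)t},
\quad\text{i.e.}\quad \|T(t)f\| \le e^{\lambda(0)t}.
```

By the spectral inclusion $e^{\sigma(L_C)} \subset \sigma(T(1))$, valid for any strongly continuous semigroup, every nonzero $\mu$ in the spectrum of $L_C$ then satisfies $e^{\operatorname{Re}\mu} \le e^{\lambda(0)}$.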

For a complex-valued function $f$, let $f^r$ and $f^i$ denote the real and imaginary parts of $f$, respectively, so that $f = f^r + i f^i$.

Lemma 2. If $\lambda(0)$ is in the discrete spectrum of $L$, then there exists a $\delta > 0$ such that for any $a$ with $\lambda(0) - \delta < a \le \lambda(0)$ and any $b$, $a + ib$ is not in the continuous spectrum of $L_C$.

Proof. Since $\lambda(0)$ is in the discrete spectrum of $L$, there exists $\delta > 0$ such that the spectrum of $L$ restricted to the orthogonal complement of $M$ is contained in $(-\infty, \lambda(0) - 2\delta)$.

We argue by contradiction. Assume that there are $a, b$ with $\lambda(0) - \delta < a \le \lambda(0)$ such that $a + ib$ is in the continuous spectrum of $L_C$. Denote $L_C - (a + ib)$ by $A$. Then $A$ is one-to-one, the range of $A$ is dense, and $A^{-1}$ is not continuous. To arrive at a contradiction, it suffices to show that, for bounded $\{f_n\}$, $Af_n \to 0$ implies $f_n \to 0$.

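First we show that $f_n \to 0$ weakly. Since $Af_n \to 0$, we have $\langle f_n, A^*\varphi\rangle = \langle Af_n, \varphi\rangle \to 0$ for every $\varphi$ in the (dense) domain of $A^*$, the adjoint of $A$. Because $A$ is one-to-one, the range of $A^*$ is dense, and together with the boundedness of $\{f_n\}$ this implies the weak convergence of $f_n$ to zero. We claim that $0 \ge \limsup\big(\mathcal{E}(f_n,f_n) + a\|f_n\|^2\big)$. Write

$$Af_n = \big((L_C - a)f_n^r + b f_n^i\big) + i\big((L_C - a)f_n^i - b f_n^r\big).$$

Since $Af_n \to 0$ and $\{f_n\}$ is bounded, the real part of the inner product of $Af_n$ and $f_n$, namely $\pi(f_n^r L_C f_n^r) + \pi(f_n^i L_C f_n^i) - a\|f_n\|^2$, goes to zero. By (7), the claim is proved.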

Let $f_{n,1}$ be the projection of $f_n$ onto $M$ and $f_{n,2}$ its component in the orthogonal complement. Since $f_n$ converges weakly, so do $f_{n,1}$ and $f_{n,2}$. Since $M$ is finite dimensional, $f_{n,1} \to 0$ strongly, and

$$0 \ge \limsup\big(\mathcal{E}(f_n,f_n) + a\|f_n\|^2\big) = \limsup\big(\mathcal{E}(f_{n,2},f_{n,2}) + a\|f_{n,2}\|^2\big) \ge \limsup\,(-\lambda(0) + 2\delta + a)\,\|f_{n,2}\|^2 \ge \delta\,\limsup \|f_{n,2}\|^2.$$

Therefore, $f_n \to 0$. □

Lemma 3. If $\lambda(C) = \lambda(0)$, then there exists $b$ such that $\lambda(0) + ib$ is in the spectrum of $L_C$.

Proof. Let $\{f_n\}$ be a sequence of normalized eigenfunctions of $L_C$ with corresponding eigenvalues $\{a_n + ib_n\}$ such that $a_n \le \lambda(0)$ and $a_n \to \lambda(0)$. Then $L_C f_n^r = a_n f_n^r - b_n f_n^i$, $L_C f_n^i = a_n f_n^i + b_n f_n^r$ and, by (7),

$$-a_n = -\pi(f_n^r L_C f_n^r) - \pi(f_n^i L_C f_n^i) \ge \mathcal{E}(f_n, f_n).$$

As in the last part of the proof of Lemma 2,

$$\lambda(0) - a_n \ge \mathcal{E}(f_n,f_n) + \lambda(0) = \mathcal{E}(f_{n,2},f_{n,2}) + \lambda(0)\,\|f_{n,2}\|^2 \ge \delta\,\|f_{n,2}\|^2,$$

where $f_{n,2}$ and $\delta$ are as in Lemma 2. Hence, $f_{n,2} \to 0$. Since the projection $\{f_{n,1}\}$ of $\{f_n\}$ onto the finite-dimensional $M$ is bounded, there exists a convergent subsequence of $\{f_{n,1}\}$. For convenience, the same index $n$ will be used. We have $f_n$ converging to some $f$ in $M$. Note that the spectral mapping theorem holds for point spectra. Hence,

$$e^{a_n + ib_n} f_n = T(1)f_n \to T(1)f \qquad\text{and}\qquad e^{ib_n} \to e^{-\lambda(0)}\,\pi\big(\bar f\,T(1)f\big).$$

Therefore, there exists some $b$ such that $\lambda(0) + ib$ and $e^{\lambda(0) + ib}$ are eigenvalues of $L_C$ and $T(1)$, respectively, with the same eigenfunction $f$. If $\lambda(0)$ is a limit point of the real parts of the residual spectrum of $L_C$, then we can repeat the above proof for the adjoint of $L_C$, which is $L_{-C}$. Hence, there exists some $b$ such that $\lambda(0) + ib$ is in the point spectrum or the residual spectrum of $L_C$. Since, by Lemma 2, there is no continuous spectrum in a neighborhood of $\lambda(0)$, this completes the proof. □

Proof of Theorem 1. By Lemmas 2 and 3, if $\lambda(C) = \lambda(0)$, then for some $b$, $\lambda(0) + ib$ is in the point spectrum or the residual spectrum of $L_C$. If $\lambda(0)$ is the real part of an eigenvalue of $L_C$ with a normalized eigenfunction $f + ig$, then by (7) and the definition of the Dirichlet form $\mathcal{E}$,

$$-\lambda(0) \ge \mathcal{E}(f,f) + \mathcal{E}(g,g) \ge -\lambda(0).$$

Then $f$ and $g$ are in $M$, and $\mathcal{C}_+$ maps the subspace spanned by $f$ and $g$ into itself. If, for some $b$, $\lambda(0) + ib$ is in the residual spectrum of $L_C$, then $\lambda(0) - ib$ is an eigenvalue of the adjoint operator $L_{-C}$. Hence, $\mathcal{C}_-$ leaves a nonzero subspace of $M$ invariant.

The proof of the other direction is obvious. □

Remark. It seems that a stronger result should hold: if $\lambda(C) = \lambda(0)$, then $\lambda(0)$ is the real part of an eigenvalue of $L_C$. If this is the case, Theorem 1 has a stronger form: equality holds iff $\mathcal{C}_+$ leaves a nonzero subspace of $M$ invariant. If (5) holds, then $(L - a)^{-1}$ is compact for $a$ in the resolvent set of $L$ [Reed and Simon (1978)], and the stronger statements hold.

Remark. As mentioned in the Introduction, the existence of the transition density is not needed here. An assumption weaker than (A1) suffices, for example, that $C$ and $\nabla U$ are in $L^2(\pi) \cap L^2_{\mathrm{loc}}(\pi)$ [Stannat (1999)].

3. Convergence rate in variational norm as criterion. Under (A1) the transition density $p(t,x,y)$ exists. Let $p_t(x,y)$ denote $p(t,x,y)/\pi(y)$; $p_t(x,y)$ is locally Hölder continuous [Bogachev, Krylov and Röckner (2001)].

Theorem 4. In addition to (A1) and (A2), if $\nabla U$ and $C$ are locally bounded, then there exists a locally bounded function $g$ such that

$$\int |p_t(x,y) - 1|\,\pi(y)\,dy \le g(x)\,e^{\lambda(C)t}.$$

Moreover, $\rho(C) \le \lambda(C)$.



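Proof. By the Chapman–Kolmogorov equation and the invariance of $\pi$, $p_t(x,\cdot) - 1 = T^*(t-1)\big(p_1(x,\cdot) - 1\big)$ (* denotes the adjoint process), so

$$\int |p_t(x,y) - 1|\,\pi(y)\,dy \le \Big(\int |p_t(x,y) - 1|^2\,\pi(y)\,dy\Big)^{1/2} = \big\|T^*(t-1)\big(p_1(x,\cdot) - 1\big)\big\| \le \text{constant}\,\|p_1(x,\cdot) - 1\|\,e^{\lambda(C)t}.$$

The last inequality holds if $p_1(x,\cdot)$ is in $L^2(\pi)$.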

We now claim that $\int p_1^2(x,z)\,\pi(z)\,dz$ is locally bounded. Since $\nabla U$ and $C$ are locally bounded, by a local Harnack inequality [Theorem 1.1 in Trudinger (1968)],

$$\forall x \in \mathbb{R}^d,\ \forall N > 0,\ \forall f \text{ with } \pi(f) = 1 \text{ and } f \ge 0: \quad \sup_{y \in B(x,N/2)} T(s)f(y) \le C(N,x) \inf_{y \in B(x,N/2)} T(2s)f(y), \tag{8}$$

where the constant $C(N,x)$ depends only on $N$ and $x$, and $B(x,N/2)$ denotes the ball in $\mathbb{R}^d$ with center $x$ and radius $N/2$. For $z$ in $B(x,N/2)$,

$$\int p_s(z,u)\,f(u)\,\pi(u)\,du = T(s)f(z) \le C(N,x)\,\frac{\int_{B(x,N/2)} T(2s)f(y)\,\pi(y)\,dy}{\pi(B(x,N/2))} \le \frac{C(N,x)}{\pi(B(x,N/2))},$$

where the last step uses $\int T(2s)f\,\pi = \pi(f) = 1$, by the invariance of $\pi$. Since this holds for every $f$ satisfying the conditions in (8), we have

$$\sup_y p_s(z,y) \le \frac{C(N,x)}{\pi(B(x,N/2))}, \qquad z \in B(x,N/2).$$

Now let $g(x) = \sup_y p_s(x,y)$; then $g$ is locally bounded, $g \ge 1$, and

$$\int p_s^2(x,y)\,\pi(y)\,dy \le g(x)\int p_s(x,y)\,\pi(y)\,dy = g(x) \le g^2(x).$$

This also establishes that $\rho(C) \le \lambda(C)$. □

Remark. The local boundedness assumption in Theorem 4 is not needed for the reversible case, since $\int p_1^2(x,y)\,\pi(y)\,dy = p_2(x,x)$ is locally bounded.

The following theorem implies that, for the reversible case, $\rho(0) = \lambda(0)$.

Theorem 5. For the reversible case, if there exists some $g$ in $L^1_{\mathrm{loc}}(\pi)$ such that

$$\int |p_t(x,y) - 1|\,\pi(y)\,dy \le g(x)\,e^{\rho t},$$

then $\|T(t) - \pi\| \le e^{\rho t}$.

Proof. For the reversible case, $T(t)$ is self-adjoint in $L^2(\pi)$, so

$$\|T(t)f\|^2 = \pi\big(f\,T(2t)f\big).$$

Take $f \in C^\infty$ with $\pi(f) = 0$ and $f = c_0$ outside $B_N$, where $B_N$ denotes the ball in $\mathbb{R}^d$ centered at $0$ with radius $N$ and $c_0$ is a constant. Then

$$\begin{aligned}
\|T(t)f\|^2 &= \pi\big(f\,T(2t)f\big) = \pi\big((f - c_0)\,T(2t)(f - c_0)\big) - c_0^2 \\
&= \int_{B_N} (f(x) - c_0)\Big(\int_{B_N} \big(p_{2t}(x,y) - 1\big)\big(f(y) - c_0\big)\,\pi(y)\,dy\Big)\pi(x)\,dx \\
&\le \|f - c_0\|_\infty^2 \int_{B_N}\int_{B_N} |p_{2t}(x,y) - 1|\,\pi(y)\,dy\,\pi(x)\,dx \\
&\le \|f - c_0\|_\infty^2 \int_{B_N} g(x)\,e^{2\rho t}\,\pi(x)\,dx \le C(N,f)\,e^{2\rho t}.
\end{aligned}$$

By Lemma 2.2 in Röckner and Wang (2001), for $s \le t$ and $\pi(f^2) = 1$,

$$\|T(s)f\|^2 \le \big(\|T(t)f\|^2\big)^{s/t} \le C(N,f)^{s/t}\,e^{2\rho s}.$$

Equality holds throughout at $s = 0$. Now take the derivative w.r.t. $s$ and evaluate it at $0$. Then

$$-2\,\mathcal{E}(f,f) \le \frac{1}{t}\log C(N,f) + 2\rho.$$

Letting $t \to \infty$, we have $\mathcal{E}(f,f) \ge -\rho$.



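For any nonconstant $h \in C_0^\infty$, let $f = (h - \pi(h))/\|h - \pi(h)\|$; then $f$ satisfies the conditions above, so $\mathcal{E}(h,h) \ge -\rho\,\|h - \pi(h)\|^2$ and (for $\rho < 0$; otherwise there is nothing to prove)

$$\pi(h^2) \le -\frac{1}{\rho}\,\mathcal{E}(h,h) + \pi^2(h).$$

Hence, we have proven $\lambda(0) \le \rho$. □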
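The inequality $\|T(s)f\|^2 \le (\|T(t)f\|^2)^{s/t}$ for $s \le t$ and $\|f\| = 1$, quoted above from Röckner and Wang (2001), expresses the log-convexity of $s \mapsto \|T(s)f\|^2$ for a self-adjoint semigroup (Hölder's inequality applied to the spectral measure of $f$). A finite-dimensional sanity check, in which a random negative semidefinite symmetric matrix stands in for $L$ (a toy assumption, not the diffusion generator itself; the function name is hypothetical):

```python
import numpy as np


def semigroup_norm_sq(L, f, s):
    """||exp(sL) f||^2 for a real symmetric matrix L, via its eigendecomposition."""
    w, V = np.linalg.eigh(L)
    coeffs = V.T @ f                      # coordinates of f in the orthonormal eigenbasis
    return float(np.sum(np.exp(2.0 * s * w) * coeffs**2))


rng = np.random.default_rng(5)
n = 6
Q = rng.standard_normal((n, n))
L = -(Q @ Q.T)                            # symmetric, negative semidefinite toy "generator"
f = rng.standard_normal(n)
f /= np.linalg.norm(f)                    # ||f|| = 1, the analogue of pi(f^2) = 1

t = 2.0
for s in (0.25, 0.5, 1.0, 1.5):
    lhs = semigroup_norm_sq(L, f, s)
    rhs = semigroup_norm_sq(L, f, t) ** (s / t)
    print(s, lhs <= rhs)
```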

4. Discussion and related problems. Our theorems give only general and qualitative information. The proofs do not reveal how the rate of convergence depends on $C$. Intuitively, multiplying $C$ by a large $k$ should speed up convergence. However, examples in Hwang, Hwang-Ma and Sheu (1993) show the contrary. It is not clear which part of $C$ contributes to acceleration. Most of the questions discussed below are based on $\lambda(C)$. Similar questions can be formulated for $\rho(C)$.

Now consider families of diffusions (algorithms) defined by (2) with index $C$ satisfying various conditions. What is the best algorithm within a certain family? For example, let $\mathcal{G}$ denote the family of diffusions with $C$ satisfying the general conditions described in the previous sections, and let $\mathcal{S}$ denote the family of diffusions with $C = S\nabla U$ for any skew-symmetric matrix $S$. One may ask for the optimal values and minimizers in the following two problems:

(A1) $\inf_{C \in \mathcal{G}} \lambda(C)$.
(A2) $\inf_{C \in \mathcal{S}} \lambda(C)$.

For (A1), even the simple question "Is $\inf_{C \in \mathcal{G}} \lambda(C) = -\infty$?" remains unanswered. For Gaussian diffusions, the optimal structure of (A2) is known [Hwang, Hwang-Ma and Sheu (1993)]. Note that for the Gaussian case, the perturbation $C$ in (A2) is linear. However, Hwang and Sheu (2000) showed that a quadratic $C$ has a better rate. Problem (A2) remains open for the general case.

Basically, the problem is to find the best "spectral gap" in a family of elliptic operators. Similar questions may be discussed on compact Riemannian manifolds. We consider the following generic case on the two-dimensional torus: let $L_C = \Delta + C \cdot \nabla$ with divergence-free $C$, and consider

(A3) $\inf_C \lambda(C)$.

Again, what is the best solution, and is it finite?
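Problem (A3) can at least be explored numerically. The sketch below is our own illustration, not a method from the text: it discretizes $L_C = \Delta + C \cdot \nabla$ on the unit torus by periodic finite differences (the grid size `N`, the stream function, and all function names are hypothetical choices) and writes the advection term in the skew-symmetric form $\tfrac12\big(C \cdot \nabla u + \nabla \cdot (Cu)\big)$, so that, as in the continuous setting, it is antisymmetric. This yields a finite-dimensional analogue of $\lambda(C) \le \lambda(0)$. A constant $C$ commutes with the discrete Laplacian and leaves the real parts of the spectrum unchanged, a simple instance of equality.

```python
import numpy as np


def periodic_difference_matrices(N):
    """Centered first difference D1 and second difference D2 on the periodic grid
    {0, 1/N, ..., (N-1)/N} of the unit circle (spacing h = 1/N)."""
    h = 1.0 / N
    I = np.eye(N)
    fwd = np.roll(I, 1, axis=1)           # (fwd @ u)[i] = u[i+1], indices mod N
    bwd = np.roll(I, -1, axis=1)          # (bwd @ u)[i] = u[i-1], indices mod N
    return (fwd - bwd) / (2 * h), (fwd - 2 * I + bwd) / h**2


def torus_generator(c1, c2):
    """Finite-difference matrix for L_C = Laplacian + C . grad on the 2-D torus.
    c1, c2 are the components of C on an N x N grid (first index = x)."""
    N = c1.shape[0]
    D1, D2 = periodic_difference_matrices(N)
    I = np.eye(N)
    Dx, Dy = np.kron(D1, I), np.kron(I, D1)
    lap = np.kron(D2, I) + np.kron(I, D2)
    a, b = c1.ravel(), c2.ravel()
    # Skew-symmetric advection (C.grad u + div(C u)) / 2: antisymmetric for any grid field.
    adv = 0.5 * (a[:, None] * Dx + Dx * a[None, :] + b[:, None] * Dy + Dy * b[None, :])
    return lap + adv


def stream_field(psi):
    """Discretely divergence-free field C = (d psi / dy, -d psi / dx)."""
    D1, _ = periodic_difference_matrices(psi.shape[0])
    return psi @ D1.T, -(D1 @ psi)


def spectral_gap(Lmat):
    """Largest real part among the eigenvalues of Lmat other than the simple eigenvalue 0."""
    ev = np.linalg.eigvals(Lmat)
    ev = np.delete(ev, np.argmin(np.abs(ev)))
    return float(ev.real.max())


N = 16
x = np.arange(N) / N
X, Y = np.meshgrid(x, x, indexing="ij")
zero = np.zeros((N, N))
lam0 = spectral_gap(torus_generator(zero, zero))          # C = 0
for amp in (1.0, 5.0, 20.0):
    c1, c2 = stream_field(amp * np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y))
    print(amp, spectral_gap(torus_generator(c1, c2)), "<=", lam0)
```

Varying the amplitude `amp` gives a crude way to probe how $\lambda(C)$ responds to scaling $C$, which the text notes is not monotone in general.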

Obviously, $\lambda(0) < 0$ implies $\lambda(C) < 0$, but what about the other direction? That is, if $L_C$ has a spectral gap, does $L$? If the answer is negative, then perturbations can drastically change fundamental convergence properties. We proved that $\rho(C) \le \lambda(C)$, but when does equality hold? If there is no spectral gap for $L$, how does the antisymmetric perturbation accelerate convergence?